Are rule-based syllabification methods adequate for languages with low syllabic complexity? The case of Italian

Authors

  • Connie R. Adsett
  • Yannick Marchand
Abstract

Syllabification information is a valuable component in speech synthesis systems. Linguistic rule-based methods have been assumed to be the best technique for determining the syllabification of unknown words. This assumption has recently been shown to be incorrect for English, where data-driven algorithms outperform rule-based methods. It is possible, however, that data-driven methods are better only for languages with complex syllable structures. In this paper, three rule-based automatic syllabification systems and two data-driven ones (Syllabification by Analogy and the Look-Up Procedure) are compared on a language with lower syllabic complexity, Italian. Using a leave-one-out procedure on 44,720 words, the best data-driven algorithm (Syllabification by Analogy) achieved 97.70% word accuracy, while the best rule-based method correctly syllabified 89.77% of words. These results show that data-driven methods can also outperform rule-based methods on Italian syllabification, indicating that data-driven methods may be the best approach to the syllabification component of speech synthesis systems.
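
The leave-one-out word-accuracy measure mentioned in the abstract can be sketched as follows. This is only a minimal illustration under stated assumptions: the four-word lexicon, the function names, and the toy V.CV rule are hypothetical and are not taken from the paper. The toy rule is a stand-in, not Syllabification by Analogy, the Look-Up Procedure, or any of the three rule-based systems that were compared.

```python
from typing import Callable, Dict

VOWELS = set("aeiou")

# Hypothetical toy lexicon (word -> gold syllabification); not the paper's data.
TOY_LEXICON: Dict[str, str] = {
    "casa": "ca-sa",
    "pane": "pa-ne",
    "pasta": "pa-sta",
    "luna": "lu-na",
}


def toy_cv_syllabifier(training: Dict[str, str], word: str) -> str:
    """Hypothetical stand-in rule: put a boundary after a vowel that is
    followed by exactly one consonant and then a vowel (V.CV).
    It ignores the training lexicon; a data-driven method would use it."""
    out = []
    for i, ch in enumerate(word):
        out.append(ch)
        if (
            ch in VOWELS
            and i + 2 < len(word)
            and word[i + 1] not in VOWELS
            and word[i + 2] in VOWELS
        ):
            out.append("-")
    return "".join(out)


def leave_one_out_word_accuracy(
    lexicon: Dict[str, str],
    syllabify: Callable[[Dict[str, str], str], str],
) -> float:
    """Hold out each word in turn, syllabify it using only the remaining
    entries, and count it correct only if the whole word matches the gold."""
    correct = 0
    for word, gold in lexicon.items():
        # Everything except the held-out word is available as training data.
        training = {w: s for w, s in lexicon.items() if w != word}
        if syllabify(training, word) == gold:
            correct += 1
    return correct / len(lexicon)


if __name__ == "__main__":
    accuracy = leave_one_out_word_accuracy(TOY_LEXICON, toy_cv_syllabifier)
    print(f"word accuracy: {accuracy:.2%}")
```

On this toy lexicon the rule misses "pasta", because it has no provision for consonant clusters, so three of the four words count as correct. Word accuracy is all-or-nothing per word: a single misplaced boundary makes the whole word count as an error. Rebuilding the training dictionary for every held-out word is quadratic in lexicon size, which keeps the sketch short but would be worth avoiding on a lexicon of tens of thousands of words.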


Similar resources

Syllabification rules versus data-driven methods in a language with low syllabic complexity: The case of Italian

Linguistic rules have been assumed to be the best technique for determining the syllabification of unknown words. This has recently been challenged for the English language where data-driven algorithms have been shown to outperform rule-based methods. It may be possible, however, that data-driven methods are only better for languages with complex syllable structures. In this study, three rule-b...


Imdlawn Tashlhiyt Berber Syllabification is Quantifier-Free∗

Imdlawn Tashlhiyt Berber (ITB) is unusual due to its tolerance of non-vocalic syllabic nuclei. Rule-based and constraint-based accounts of ITB syllabification do not directly address the question of how complex the process is. Model theory and formal logic allow for comparison of complexity across different theories of phonology by identifying the computational power (or expressivity) of lingui...


Automatic syllabification for Danish text-to-speech systems

In this paper, a rule-based automatic syllabifier for Danish based on the Maximal Onset Principle is described. The prior success of rule-based methods applied to Portuguese and Catalan syllabification modules was the basis of this work. The system was implemented and tested using a very small set of rules. The results gave word accuracy rates of 96.9% and 98.7%, contrary to our initi...


Investigating the missing data effect on credit scoring rule based models: The case of an Iranian bank

Credit risk management is a process in which banks estimate the probability of default (PD) for each loan applicant. Data sets of previous loan applicants are built by gathering their data; these internal data sets are usually completed using external credit bureau data and finally used for estimating PD in banks. There is also a continuing interest among banks in using rule-based classifiers to b...


Sylli: Automatic Phonological Syllabification for Italian

We will present a complete syllabifier for Italian (Sylli) that is based on phonological principles and is flexible and easy to adapt for other uses, alphabets and languages. Crucial concepts regarding syllabification principles in modern phonological theory will be discussed (§1.1); specific issues concerning Italian syllabification will then be summarised (§1.2) and an overview of the available au...




Publication date: 2007